Unsupervised Discretization Using Tree-Based Density Estimation
Authors
Abstract
This paper presents an unsupervised discretization method that performs density estimation for univariate data. The subintervals that the discretization produces can be used as the bins of a histogram. Histograms are a simple and broadly understood means of displaying data, and our method automatically adapts bin widths to the data. It uses the log-likelihood as the scoring function to select cut points and the cross-validated log-likelihood to select the number of intervals. We compare this method with equal-width discretization, where we also select the number of bins using the cross-validated log-likelihood, and with equal-frequency discretization.
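The procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`bin_densities`, `log_likelihood`, `greedy_cuts`, `select_bins`), the greedy search over midpoints between adjacent distinct values, the add-one smoothing for empty bins, and the default of 5 folds are all assumptions made for illustration. The sketch picks cut points by training log-likelihood and picks the number of intervals by cross-validated log-likelihood.

```python
import numpy as np

def bin_densities(edges, train):
    """Histogram density per bin, estimated from the training values."""
    counts, _ = np.histogram(train, bins=edges)
    # Assumption: add-one smoothing so that empty bins keep a nonzero density.
    probs = (counts + 1.0) / (counts.sum() + len(counts))
    return probs / np.diff(edges)

def log_likelihood(edges, train, test):
    """Log-likelihood of `test` under the histogram fitted on `train`."""
    dens = bin_densities(edges, train)
    idx = np.searchsorted(edges, test, side="right") - 1
    idx = np.clip(idx, 0, len(dens) - 1)  # the right-most edge belongs to the last bin
    return float(np.sum(np.log(dens[idx])))

def greedy_cuts(train, lo, hi, max_bins):
    """Return edge arrays for 1, 2, ... bins, adding one cut point at a time."""
    xs = np.unique(train)
    candidates = (xs[:-1] + xs[1:]) / 2.0  # midpoints between distinct values
    edges = np.array([lo, hi])
    result = [edges]
    for _ in range(max_bins - 1):
        best_ll, best_edges = -np.inf, None
        for c in candidates:
            if c in edges:
                continue
            trial = np.sort(np.append(edges, c))
            ll = log_likelihood(trial, train, train)  # training log-likelihood
            if ll > best_ll:
                best_ll, best_edges = ll, trial
        if best_edges is None:  # no unused candidate cut point is left
            break
        edges = best_edges
        result.append(edges)
    return result

def select_bins(data, max_bins=10, folds=5, seed=0):
    """Choose the number of bins by cross-validated log-likelihood."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(), data.max()
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(data), folds)
    scores = np.zeros(max_bins)
    for i in range(folds):
        test = parts[i]
        train = np.concatenate([p for j, p in enumerate(parts) if j != i])
        per_k = greedy_cuts(train, lo, hi, max_bins)
        for k in range(max_bins):
            edges_k = per_k[min(k, len(per_k) - 1)]
            scores[k] += log_likelihood(edges_k, train, test)
    best_k = int(np.argmax(scores)) + 1
    # Refit the cut points on all of the data with the selected number of bins.
    return greedy_cuts(data, lo, hi, best_k)[-1]
```

Calling `select_bins(values)` on a one-dimensional array returns the bin edges, which can be passed to `np.histogram(values, bins=edges)`. The range is fixed to the minimum and maximum of the full data set so that every held-out point falls inside some bin; this is a design choice of the sketch, not something stated in the abstract.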
Similar resources
Unsupervised Discretization Using Kernel Density Estimation
Discretization, defined as a set of cuts over domains of attributes, represents an important preprocessing task for numeric data analysis. Some Machine Learning algorithms require a discrete feature space, but in real-world applications continuous attributes must be handled. To deal with this problem, many supervised discretization methods have been proposed, but little has been done to synthesize...
Spatial variability and estimation of tree attributes in a plantation forest in the Caspian region of Iran using geostatistical analysis
This research was conducted to investigate spatial variability and estimate tree attributes in a plantation forest in the Caspian region of Iran using geostatistical analysis. Sampling was performed on a 50 m × 125 m systematic grid in an 18-year-old maple stand (Acer velutinum Boiss.) using circular sample plots of 200 m² area. In total, 96 sample plots were measured in 63 hectares and 14.25 he...
High-dimensional probability density estimation with randomized ensembles of tree structured Bayesian networks
In this work we explore the Perturb and Combine idea, celebrated in supervised learning, in the context of probability density estimation in high-dimensional spaces with graphical probabilistic models. We propose a new family of unsupervised learning methods of mixtures of large ensembles of randomly generated tree or poly-tree structures. The specific feature of these methods is their scalabil...
Discretization of continuous features in clinical datasets
BACKGROUND: The increasing availability of clinical data from electronic medical records (EMRs) has created opportunities for secondary uses of health information. When used in machine learning classification, many data features must first be transformed by discretization. OBJECTIVE: To evaluate six discretization strategies, both supervised and unsupervised, using EMR data. MATERIALS AND MET...
Estimation of Tree Biomass at Individual tree, Sample plot and Hybrid Level using Drone Images
Algorithms that convert two-dimensional images to 3D data raise the hope that the structural properties of trees can be extracted from such images. In this study, the accuracy of biomass estimation at the tree, plot, and hybrid levels using UAV images was investigated. In 34.8 ha of Sisangan Forest Park, 854 images were acquired with a quadcopter from an altitude of 100 meters above ground. SF...